Emmanuel Macron

It's funny how polished the imagery on Emmanuel's site is. Can we turn it into an image gallery?

We start from this URL: https://en-marche.fr/emmanuel-macron/le-programme



In [2]:

    
from bs4 import BeautifulSoup
import requests



In [3]:

    
r = requests.get('https://en-marche.fr/emmanuel-macron/le-programme')



In [4]:

    
soup = BeautifulSoup(r.text, 'html.parser')



In [5]:

    
proposals = soup.find_all(class_='programme__proposal')



In [6]:

    
proposals = [p for p in proposals if 'programme__proposal--category' not in p.attrs['class']]



In [7]:

    
len(proposals)









    Out[7]:





36



In [8]:

    
p = proposals[0]



In [9]:

    
full_url = 'https://en-marche.fr' + p.find('a').attrs['href']
full_url









    Out[9]:





'https://en-marche.fr/emmanuel-macron/le-programme/action-publique-fonction-publique'



In [10]:

    
full_urls = ['https://en-marche.fr' + p.find('a').attrs['href'] for p in proposals]



In [11]:

    
full_urls[:10]









    Out[11]:





['https://en-marche.fr/emmanuel-macron/le-programme/action-publique-fonction-publique',
 'https://en-marche.fr/emmanuel-macron/le-programme/agriculture',
 'https://en-marche.fr/emmanuel-macron/le-programme/culture',
 'https://en-marche.fr/emmanuel-macron/le-programme/defense',
 'https://en-marche.fr/emmanuel-macron/le-programme/dependance',
 'https://en-marche.fr/emmanuel-macron/le-programme/dialogue-social',
 'https://en-marche.fr/emmanuel-macron/le-programme/education',
 'https://en-marche.fr/emmanuel-macron/le-programme/%C3%A9galit%C3%A9-hommes-et-femmes',
 'https://en-marche.fr/emmanuel-macron/le-programme/emploi-ch%C3%B4mage-securites-professionnelles',
 'https://en-marche.fr/emmanuel-macron/le-programme/enseignement-superieur-recherche']
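The absolute URLs above are built by concatenating the site root with each `href`, which works as long as every `href` is root-relative. As a minimal sketch of a more forgiving alternative, the standard library's `urljoin` resolves an `href` against the page it was found on. The helper name `absolute_url` and the constant `BASE_URL` are illustrative, not part of the notebook.

```python
from urllib.parse import urljoin

# Page on which the links were found (the start URL used in this notebook).
BASE_URL = 'https://en-marche.fr/emmanuel-macron/le-programme'

def absolute_url(href, base=BASE_URL):
    """Resolve href against base; hrefs that are already absolute pass through unchanged."""
    return urljoin(base, href)

print(absolute_url('/emmanuel-macron/le-programme/culture'))
```

With a root-relative `href` this gives the same result as the concatenation, and it also handles links that are already absolute.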



In [12]:

    
r = requests.get(full_url)
soup = BeautifulSoup(r.text, 'html.parser')



In [13]:

    
figure_tag = soup.find('figure', class_='fullscreen')
figure_tag









    Out[13]:





<figure class="fullscreen">
<img alt="01-fonction-publique-hospital-sante-emmanuel-macron-en-marche" src="/assets/images/01-fonction-publique-hospital-sante-emmanuel-macron-en-marche?q=70&amp;cache=e7d04db7e6ec8a188aee&amp;fm=pjpg&amp;s=97b9c84c57c417dcef72c4919e6f2625" title="Action publique / Fonction publique"/>
</figure>

We can now extract the link to the image.



In [14]:

    
src_url = 'https://en-marche.fr' + figure_tag('img')[0].attrs['src']
src_url









    Out[14]:





'https://en-marche.fr/assets/images/01-fonction-publique-hospital-sante-emmanuel-macron-en-marche?q=70&cache=e7d04db7e6ec8a188aee&fm=pjpg&s=97b9c84c57c417dcef72c4919e6f2625'

We can display it in the notebook.



In [15]:

    
from IPython.display import Image



In [16]:

    
Image(url=src_url)









    Out[16]:



In [17]:

    
def extract_img_src(url):
    "Extracts image src url from linked page."
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    figure_tag = soup.find('figure', class_='fullscreen')
    # find() returns None when there is no <img>, whereas figure_tag('img')
    # returns a (possibly empty) list that is never None.
    img_tag = figure_tag.find('img') if figure_tag is not None else None
    if img_tag is not None:
        src_url = 'https://en-marche.fr' + img_tag.attrs['src']
        return src_url
    else:
        print("no image for url: {}".format(url))
        return None

We can repeat this process and build a gallery from all of these images.



In [18]:

    
srcs = [extract_img_src(url) for url in full_urls]









    



no image for url: https://en-marche.fr/emmanuel-macron/le-programme/familles-et-societe
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/handicap
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/immigration-et-asile
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/justice
no image for url: https://en-marche.fr/emmanuel-macron/le-programme/pauvrete



In [19]:

    
srcs = [_ for _ in srcs if _ is not None]



In [20]:

    
header = """<!doctype html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>Galerie des photos du site d'Emmanuel Macron</title>
  <style>
  img {width: 100%;}
  </style>
</head>"""



In [22]:

    
def format_as_img_tag(src):
    return "<img src={} />".format(src)



In [23]:

    
format_as_img_tag(srcs[2])









    Out[23]:





'<img src=https://en-marche.fr/assets/images/04-culture-musee-exposition-guadeloupe-emmanuel-macron?q=70&cache=0f9e2f1675c10ef5c67b&fm=pjpg&s=92575e4c67cd6a07095acfd08652efb6 />'
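The tag above leaves the `src` value unquoted and writes the `&` characters of the query string as-is. Browsers usually tolerate this, but it is not robust markup. A minimal sketch of a stricter variant, using the standard library's `html.escape`, could look like this. The function name `format_as_quoted_img_tag` and the shortened sample URL are illustrative.

```python
import html

def format_as_quoted_img_tag(src):
    """Return an <img> tag with a quoted, HTML-escaped src attribute."""
    # quote=True also escapes quote characters, so the value cannot break out of the attribute.
    return '<img src="{}" />'.format(html.escape(src, quote=True))

# Hypothetical, shortened URL used only to show the escaping.
print(format_as_quoted_img_tag('https://en-marche.fr/assets/images/example?q=70&fm=pjpg'))
```

The `&` in the query string becomes `&amp;` in the markup, which the browser decodes back to `&` when it requests the image.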



In [24]:

    
# Write as UTF-8 to match the <meta charset="utf-8"> declared in the header.
with open('galerie_macron.html', 'w', encoding='utf-8') as f:
    body = """<body>
{0}
</body>""".format("\n".join(format_as_img_tag(url) for url in srcs))
    html = header + body + "</html>"
    f.write(html)
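The header defined earlier hard-codes the page title, so reusing it for another site would keep the same title. As a sketch, a hypothetical helper `build_gallery` (not part of the notebook) can take the title and the already formatted `<img>` tags as parameters and return the whole page.

```python
import html

# Page template; the CSS braces are doubled so that str.format leaves them alone.
PAGE_TEMPLATE = """<!doctype html>
<html lang="fr">
<head>
  <meta charset="utf-8">
  <title>{title}</title>
  <style>
  img {{width: 100%;}}
  </style>
</head>
<body>
{images}
</body>
</html>"""

def build_gallery(title, img_tags):
    """Return a complete HTML page with the given title and one image tag per line."""
    return PAGE_TEMPLATE.format(title=html.escape(title), images="\n".join(img_tags))

page = build_gallery("Galerie", ['<img src="a.jpg" />', '<img src="b.jpg" />'])
print(page)
```

The resulting string can then be written to a file in the same way as in the cell above.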

These are some nice photos...

François Fillon

Now that François Fillon's programme has been released, we can repeat the same approach.



In [35]:

    
r = requests.get('https://www.fillon2017.fr/projet/')
soup = BeautifulSoup(r.text, 'html.parser')



In [36]:

    
tags = soup.find_all('a', class_='projectItem__inner')



In [37]:

    
sublinks = [tag.attrs['href'] for tag in tags]

Let's tackle the individual pages.



In [39]:

    
sublinks[0]









    Out[39]:





'https://www.fillon2017.fr/projet/competitivite/'



In [38]:

    
r = requests.get(sublinks[0])
soup = BeautifulSoup(r.text, 'html.parser')



In [48]:

    
src = soup.find('div', class_='singleProject__banner bannerWithMask backgroundCover').attrs['style'].split("background-image: url(")[1][1:-3]



In [49]:

    
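def extract_img_src(url):
    "Extracts the banner image url from the inline style of a project page."
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # The url sits in the style attribute; [1:-3] drops the opening quote and
    # the three trailing characters that presumably close the url(...) value.
    src = soup.find('div', class_='singleProject__banner bannerWithMask backgroundCover').attrs['style'].split("background-image: url(")[1][1:-3]
    return src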
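The slicing above depends on the exact number of characters around the URL in the `style` attribute. A minimal sketch of a less brittle alternative uses a regular expression. It assumes the attribute looks like `background-image: url('...');`, which is inferred from the slicing and not confirmed by the notebook. The name `background_image_url` is illustrative.

```python
import re

# Matches url(...) after background-image, with or without quotes around the value.
BACKGROUND_URL = re.compile(r"background-image:\s*url\(\s*['\"]?(.*?)['\"]?\s*\)")

def background_image_url(style):
    """Return the URL in a background-image declaration, or None if there is none."""
    match = BACKGROUND_URL.search(style)
    return match.group(1) if match else None

# Assumed shape of the style attribute on the banner <div>.
sample_style = "background-image: url('https://www.fillon2017.fr/wp-content/uploads/2016/01/DSCF7108.jpg');"
print(background_image_url(sample_style))
```

Returning `None` when nothing matches makes it possible to skip pages without a banner, as was done for the pages without a figure on the first site.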



In [51]:

    
srcs = [extract_img_src(url) for url in sublinks]



In [52]:

    
srcs









    Out[52]:





['https://www.fillon2017.fr/wp-content/uploads/2016/01/DSCF7108.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/FRANCOIS_FILLON_LIMOUSIN_0558-1024x457.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/allocation_sociale_unique-1024x509.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2016/01/DSCF5325.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_0849-1024x478.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_7587-1024x416.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF7085-1024x462.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/fonction_publique-1024x451.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2016/06/DSC_1847.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1579-1024x470.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/05/femmes-1-1024x314.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF7280-1024x411.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/1234432974.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/defense-1024x471.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1232-1024x523.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/enseignement_recherche-1024x474.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/FRANCOIS_FILLON_LIMOUSIN_0327-1-1024x458.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/chasse_ff-1024x530.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF7239-1024x420.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_0926-1024x465.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/03/DSCF5040.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/pouv_achat-1024x494.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1341-1024x519.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/04/IMG_0353.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/reforme_etat-1024x388.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_8849-1024x576.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1135-1024x445.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/image1-1024x444.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/FD_2439_2-1024x411.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/13062320_10154143009027533_3114262831415445150_n.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2016/11/Capture-d’écran-2017-02-23-à-00.05.53.png',
 'https://www.fillon2017.fr/wp-content/uploads/2016/01/Capture-d’écran-2017-02-23-à-10.29.24.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_1742-1024x456.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_0952-1024x434.png',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/IMG_9977-1024x450.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2015/11/IMG_8838.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/politique_ville-1024x478.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/DSCF5048-1024x390.jpg',
 'https://www.fillon2017.fr/wp-content/uploads/2017/03/etranger-1024x427.jpg']
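Two of the URLs above contain non-ASCII characters (`’`, `é`, `à`). Browsers generally cope with these, but if plain ASCII URLs are wanted in the generated HTML, the path can be percent-encoded with the standard library. This is a sketch; the helper name `percent_encode_path` is illustrative.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def percent_encode_path(url):
    """Percent-encode non-ASCII characters in the path of url, leaving the rest untouched."""
    parts = urlsplit(url)
    # '%' is kept as a safe character so that already encoded paths are not encoded twice.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))

print(percent_encode_path(
    'https://www.fillon2017.fr/wp-content/uploads/2016/11/Capture-d’écran-2017-02-23-à-00.05.53.png'
))
```

URLs that are already ASCII come back unchanged, so the helper can be applied to the whole list.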



In [53]:

    
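# Write as UTF-8: some image URLs contain non-ASCII characters, and the header declares utf-8.
with open('galerie_fillon.html', 'w', encoding='utf-8') as f:
    body = """<body>
{0}
</body>""".format("\n".join(format_as_img_tag(url) for url in srcs))
    # The header was written for the first gallery, so adapt its title here.
    html = header.replace("d'Emmanuel Macron", "de François Fillon") + body + "</html>"
    f.write(html)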


